Abstract. Suppose that a target function $f_0 : \mathbb{R}^d \to \mathbb{R}$ is monotonic, namely, weakly
increasing, and an original estimate $\hat f$ of the target function is available, which is not
weakly increasing. Many common estimation methods used in statistics produce such
estimates $\hat f$. We show that these estimates can always be improved with no harm using
rearrangement techniques: The rearrangement methods, univariate and multivariate,
transform the original estimate to a monotonic estimate $\hat f^*$, and the resulting estimate
is closer to the true curve $f_0$ in common metrics than the original estimate $\hat f$. We
illustrate the results with a computational example and an empirical example dealing
with age-height growth charts.
1. Introduction

A common problem in statistics is to approximate an unknown monotonic function 
on the basis of available samples. For example, biometric age-height charts should be 
monotonic in age; econometric demand functions should be monotonic in price; and 
quantile functions should be monotonic in the probability index. Suppose an original, 
possibly non-monotonic, estimate is available. Then, the rearrangement operation from
variational analysis (Hardy, Littlewood, and Polya 1952, Lorentz 1953, Villani 2003) 
can be used to monotonize the original estimate. The rearrangement has been shown 
to be useful in producing monotonized estimates of conditional mean functions (Dette, 
Neumeyer, and Pilz 2006, Dette and Pilz 2006) and various conditional quantile and 
probability functions (Chernozhukov, Fernandez-Val, and Galichon 2006a, 2006b). In
this paper, we show that the rearrangement of the original estimate is useful not
only for producing monotonicity: it also has the following important property. The
rearrangement always improves upon the original estimate whenever the latter is not
monotonic. Namely, the rearranged curves are always closer (often considerably closer)
to the target curve being estimated. Furthermore, this improvement property is generic,
i.e., it does not depend on the underlying specifics of the original estimate and applies
to both univariate and multivariate cases.

The paper is organized as follows. In Section 2.1, we motivate the monotonicity 
issue in regression problems, and discuss common estimates/approximations of regression
functions that are not naturally monotonic. In Section 2.2, we analyze the improvements 
in estimation/approximation properties that the rearranged estimates deliver. In Section 
2.3, we discuss the computation of the rearrangement, using sorting and simulation. In 
Section 2.4, we extend the analysis of Section 2.2 to multivariate functions. In Section 3, 
we provide proofs of the main results. In Section 4, we present an empirical application to 
biometric age-height charts. We show how the rearrangement monotonizes and improves 
the original estimates of the conditional mean function in this example, and quantify 
the improvement in a simulation example resembling the empirical one. In the same 
section, we also analyze estimation of conditional quantile processes for height given age 
that need to be monotonic in both age and the quantile index. We apply a multivariate 
rearrangement to doubly monotonize the estimates both in age and the quantile index. 
We show that the rearrangement monotonizes and improves the original estimates, and 
quantify the improvement in a simulation example mimicking the empirical example. In
Section 5 we offer a summary and a conclusion. 

2. Improving Approximations of Monotonic Functions 

2.1. Common Estimates of Monotonic Functions. A basic problem in many areas
of analysis is to approximate an unknown function $f_0 : \mathbb{R}^d \to \mathbb{R}$ on the basis of
some available information. In statistics, the common problem is to approximate an
unknown regression function, such as the conditional mean or a conditional quantile, using
an available sample. In numerical analysis, the common problem is to approximate
an intractable target function by a more tractable function on the basis of the target
function's values at a collection of points.

Suppose we know that the target function $f_0$ is monotonic, namely weakly increasing.
Suppose further that an original estimate $\hat f$ is available, which is not necessarily
monotonic. Many common estimation methods do indeed produce such estimates. Can these
estimates always be improved with no harm? The answer provided by this paper is yes:
the rearrangement method transforms the original estimate to a monotonic estimate
$\hat f^*$, and this estimate is in fact closer to the true curve $f_0$ than the original estimate $\hat f$
in common metrics. Furthermore, the rearrangement is computationally tractable, and
thus preserves the computational appeal of the original estimates.

Estimation methods, specifically the ones used in regression analysis, can be grouped
into global methods and local methods. An example of a global method is the series
estimator of $f_0$ taking the form

$$\hat f(x) = P_{k_n}(x)'\hat b,$$

where $P_{k_n}(x)$ is a $k_n$-vector of suitable transformations of the variable $x$, such as
B-splines, polynomials, and trigonometric functions. Section 4 lists specific examples in
the context of an empirical example. The estimate $\hat b$ is obtained by solving the regression
problem

$$\hat b = \arg\min_{b \in \mathbb{R}^{k_n}} \sum_{i=1}^{n} \rho\big(Y_i - P_{k_n}(X_i)'b\big),$$

where $(Y_i, X_i)$, $i = 1, \ldots, n$, denotes the data. In particular, using the square loss $\rho(w) = w^2$
produces estimates of the conditional mean of $Y_i$ given $X_i$ (Gallant 1981, Andrews
1991, Stone 1994, Newey 1997), while using the asymmetric absolute deviation loss
$\rho(w) = (u - 1\{w < 0\})w$ produces estimates of the conditional $u$-quantile of $Y_i$ given $X_i$
(Koenker and Bassett 1978, Portnoy 1997, He and Shao 2000). Likewise, in numerical
analysis "data" often consist of values $Y_i$ of a target function evaluated at a collection
of mesh points $\{X_i, i = 1, \ldots, n\}$ and the mesh points themselves. The series estimates
$x \mapsto \hat f(x) = P_{k_n}(x)'\hat b$ are widely used in data analysis due to their good approximation
properties and computational tractability. However, these estimates need not be naturally
monotone, unless explicit constraints are added to the optimization program (for
example, Matzkin (1994), Silvapulle and Sen (2005), and Koenker and Ng (2005)).
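
The series estimator can be sketched in a few lines for the square loss. This is a minimal illustration, not the specification used in the paper: the polynomial basis, the degree, the simulated data, and the names `series_fit`, `f_hat`, and `is_monotone` are all assumptions made for the example.

```python
import numpy as np

def series_fit(x, y, degree=4):
    """Least-squares series estimate with the polynomial basis
    P(x) = (1, x, ..., x^degree); returns the fitted function."""
    P = np.vander(x, degree + 1, increasing=True)
    b_hat, *_ = np.linalg.lstsq(P, y, rcond=None)

    def f_hat(t):
        T = np.vander(np.atleast_1d(t), degree + 1, increasing=True)
        return T @ b_hat

    return f_hat

# Hypothetical data: an increasing target observed with noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = np.sqrt(x) + 0.2 * rng.standard_normal(200)

f_hat = series_fit(x, y)
grid = np.linspace(0.0, 1.0, 101)
values = f_hat(grid)

# Nothing in the least-squares problem forces this flag to be True.
is_monotone = bool(np.all(np.diff(values) >= 0))
```

The fit minimizes the sum of squared residuals without any shape constraint, so whether `is_monotone` holds depends on the sample.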

Examples of local methods include kernel and locally polynomial estimators. A kernel
estimator takes the form

$$\hat f(x) = \arg\min_{b \in \mathbb{R}} \sum_{i=1}^{n} w_i\,\rho(Y_i - b), \qquad w_i = K\!\left(\frac{X_i - x}{h}\right),$$

where the loss function $\rho$ plays the same role as above, $K(u)$ is a standard, possibly
high-order, kernel function, and $h > 0$ is a vector of bandwidths (see, for example,
Wand and Jones (1995) and Ramsay and Silverman (2005)). The resulting estimate
$x \mapsto \hat f(x)$ need not be naturally monotone. Dette, Neumeyer, and Pilz (2006) show
that the rearrangement transforms the kernel estimate into a monotonic one. We further
show here that the rearranged estimate necessarily improves upon the original estimate,
whenever the latter is not monotonic. The locally polynomial regression is a related
local method (Chaudhuri 1991, Fan and Gijbels 1996). In particular, the locally linear
estimator takes the form

$$(\hat f(x), \hat d(x)) = \arg\min_{b \in \mathbb{R},\, d \in \mathbb{R}} \sum_{i=1}^{n} w_i\,\rho\big(Y_i - b - d(X_i - x)\big), \qquad w_i = K\!\left(\frac{X_i - x}{h}\right).$$

The resulting estimate $x \mapsto \hat f(x)$ may also be non-monotonic, unless explicit constraints
are added to the optimization problem. Section 4 illustrates the non-monotonicity of
the locally linear estimate in an empirical example.
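
For concreteness, the locally linear estimator with the square loss reduces to a weighted least-squares fit at each evaluation point. The sketch below assumes a box kernel $K(u) = 1\{|u| \le 1\}$ and a scalar regressor; the function name `local_linear` and the simulated data are hypothetical.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Locally linear estimate of the regression function at x0,
    using the square loss and a box kernel with bandwidth h."""
    w = (np.abs((x - x0) / h) <= 1.0).astype(float)   # kernel weights w_i
    Z = np.column_stack([np.ones_like(x), x - x0])    # regressors (1, X_i - x0)
    sw = np.sqrt(w)
    # Weighted least squares: minimize sum_i w_i (Y_i - b - d (X_i - x0))^2.
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return coef[0]                                    # the intercept b is the fit at x0

# Hypothetical data: an increasing target observed with noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=300)
y = x ** 2 + 0.1 * rng.standard_normal(300)

grid = np.linspace(0.05, 0.95, 19)
fit = np.array([local_linear(x0, x, y, h=0.1) for x0 in grid])
```

Each point of `fit` comes from a separate local regression, which is why neighboring fitted values are not tied together by any monotonicity restriction.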

In summary, there are many attractive estimation and approximation methods in statistics
that do not necessarily produce monotonic estimates. These estimates do have
other attractive features though, such as good approximation properties and computational
tractability. Below we show that the rearrangement operation applied to these
estimates produces (monotonic) estimates that improve the approximation properties of
the original estimates by bringing them closer to the target curve. Furthermore, the
rearrangement is computationally tractable, and thus preserves the computational appeal
of the original estimates.



2.2. The Rearrangement and its Approximation Property: The Univariate
Case. In what follows, let $\mathcal{X}$ be a compact interval. Without loss of generality, it is
convenient to take this interval to be $\mathcal{X} = [0,1]$. Let $f(x)$ be a measurable function
mapping $\mathcal{X}$ to $K$, a bounded subset of $\mathbb{R}$. Let $F_f(y) = \int_{\mathcal{X}} 1\{f(u) \le y\}\,du$ denote the
distribution function of $f(X)$ when $X$ follows the uniform distribution on $[0,1]$. Let

$$f^*(x) := Q_f(x) := \inf\{y \in \mathbb{R} : F_f(y) \ge x\}$$

be the quantile function of $F_f(y)$. Thus,

$$f^*(x) := \inf\left\{ y \in \mathbb{R} : \left[ \int_{\mathcal{X}} 1\{f(u) \le y\}\,du \right] \ge x \right\}.$$

This function $f^*$ is called the increasing rearrangement of the function $f$.

Thus, the rearrangement operator simply transforms a function $f$ to its quantile function
$f^*$. That is, $x \mapsto f^*(x)$ is the quantile function of the random variable $f(X)$ when
$X \sim U(0,1)$. It is also convenient to think of the rearrangement as a sorting operation:
given values of the function $f(x)$ evaluated at $x$ in a fine enough net of equidistant
points, we simply sort the values in increasing order. The function created in this
way is the rearrangement of $f$.
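
The sorting description translates directly into code. The sketch below is an illustration under assumptions: the midpoint grid, the grid size, and the test function $f(x) = (x - 1/2)^2$ are choices made here, not taken from the paper. For this $f$, $F_f(y) = 2\sqrt{y}$ on $[0, 1/4]$, so the rearrangement is $f^*(x) = x^2/4$, which gives a closed form to compare against.

```python
import numpy as np

def increasing_rearrangement(f, n_points=1000):
    """Approximate the increasing rearrangement of f on [0, 1] by sorting
    the values of f on a net of equidistant (midpoint) grid points."""
    grid = (np.arange(n_points) + 0.5) / n_points
    return grid, np.sort(f(grid))

# Hypothetical example: f is decreasing on [0, 1/2] and increasing on [1/2, 1].
f = lambda x: (x - 0.5) ** 2
grid, f_star = increasing_rearrangement(f)

# Largest deviation from the closed-form rearrangement x^2 / 4.
max_gap = float(np.max(np.abs(f_star - grid ** 2 / 4)))
```

The sorted values form a weakly increasing step function on the grid, and refining the grid shrinks `max_gap`.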

The main point of this paper is the following: 

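Proposition 1. Let $f_0 : \mathcal{X} \to K$ be a weakly increasing measurable function in $x$, where
$K$ is a bounded subset of $\mathbb{R}$. This is the target function. Let $\hat f : \mathcal{X} \to K$ be another
measurable function, an initial estimate of the target function $f_0$.

1. For any $p \in [1, \infty]$, the rearrangement of $\hat f$, denoted $\hat f^*$, weakly reduces the estimation
error:

$$\left[ \int_{\mathcal{X}} \big| \hat f^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} \le \left[ \int_{\mathcal{X}} \big| \hat f(x) - f_0(x) \big|^p \, dx \right]^{1/p}. \tag{2.1}$$

2. Suppose that there exist regions $\mathcal{X}_0$ and $\mathcal{X}_0'$, each of measure greater than $\delta > 0$, such
that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}_0'$ we have that (i) $x' > x$, (ii) $\hat f(x) > \hat f(x') + \epsilon$, and (iii)
$f_0(x') > f_0(x) + \epsilon$, for some $\epsilon > 0$. Then the gain in the quality of approximation is
strict for $p \in (1, \infty)$. Namely, for any $p \in [1, \infty]$,

$$\left[ \int_{\mathcal{X}} \big| \hat f^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} \le \left[ \int_{\mathcal{X}} \big| \hat f(x) - f_0(x) \big|^p \, dx - \delta \eta_p \right]^{1/p}, \tag{2.2}$$

where $\eta_p = \inf\{ |v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p \}$ and $\eta_p > 0$ for $p \in (1, \infty)$, with
the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$.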

The first part of the proposition states the weak inequality (2.1), and the second part
states the strict inequality (2.2). For example, the inequality is strict for $p \in (1, \infty)$ if
the original estimate $\hat f(x)$ is decreasing on a subset of $\mathcal{X}$ having positive measure, while
the target function $f_0(x)$ is increasing on $\mathcal{X}$ (by increasing, we mean strictly increasing
throughout). Of course, if $f_0(x)$ is constant, then the inequality (2.1) becomes an equality,
as the distribution of the rearranged function $\hat f^*$ is the same as the distribution of
the original function $\hat f$, that is, $F_{\hat f^*} = F_{\hat f}$.
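
As a worked illustration of the constant $\eta_p$ (for the special case $p = 2$ only, which is a choice made here for tractability), expanding the squares gives a closed form:

$$\begin{aligned}
(v - t')^2 + (v' - t)^2 - (v - t)^2 - (v' - t')^2
&= -2vt' - 2v't + 2vt + 2v't' \\
&= 2\,(v' - v)(t' - t).
\end{aligned}$$

Under the constraints $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$, each factor is at least $\epsilon$, so $\eta_2 \ge 2\epsilon^2 > 0$. In this case the guaranteed reduction in the squared $L^2$ error in (2.2) is at least $2\delta\epsilon^2$.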

This proposition establishes that the rearranged estimate $\hat f^*$ has a smaller estimation
error in the $L^p$ norm than the original estimate whenever the latter is not monotone.
This is a very useful and generally applicable property that is independent of the sample
size and of the way the original estimate $\hat f$ is obtained.

An indirect proof of the weak inequality (2.1) follows as a simple but important consequence
of the following classical inequality due to Lorentz (1953): Let $q$ and $g$ be two functions
mapping $\mathcal{X}$ to $K$, a bounded subset of $\mathbb{R}$. Let $q^*$ and $g^*$ denote their corresponding
increasing rearrangements. Then,

$$\int_{\mathcal{X}} L(q^*(x), g^*(x), x)\,dx \le \int_{\mathcal{X}} L(q(x), g(x), x)\,dx,$$

for any submodular discrepancy function $L : \mathbb{R}^3 \to \mathbb{R}$. Set $q(x) = \hat f(x)$, $q^*(x) =
\hat f^*(x)$, $g(x) = f_0(x)$, and $g^*(x) = f_0^*(x)$. Now, note that in our case $f_0^*(x) = f_0(x)$
almost everywhere, that is, the target function is its own rearrangement. Moreover,
$L(v, w, x) = |w - v|^p$ is submodular for $p \in [1, \infty)$. This proves the first part of the
proposition above. For $p = \infty$, the first part follows by taking the limit as $p \to \infty$.

In Section 3 we provide a proof of the strong inequality (2.2) as well as the direct proof
of the weak inequality (2.1). The direct proof illustrates how reductions of the estimation
error arise from even a partial sorting of the values of the estimate $\hat f$. Moreover, the
direct proof characterizes the conditions for the strict reduction of the estimation error.

The following immediate implication of the above finite-sample result is also worth
emphasizing: The rearranged estimate $\hat f^*$ inherits the $L^p$ rates of convergence from the
original estimate $\hat f$. For $p \in [1, \infty]$, if $\lambda_n = [\int_{\mathcal{X}} |f_0(x) - \hat f(x)|^p\,dx]^{1/p} = O_P(a_n)$ for some
sequence of constants $a_n$, then $[\int_{\mathcal{X}} |f_0(x) - \hat f^*(x)|^p\,dx]^{1/p} \le \lambda_n = O_P(a_n)$.
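
The weak inequality can be checked numerically on a grid. The sketch below uses a hypothetical increasing target and a hypothetical oscillating estimate, both chosen here for illustration, and compares discretized $L^p$ errors before and after sorting for $p = 1$, $p = 2$, and $p = \infty$.

```python
import numpy as np

def lp_error(a, b, p):
    """Discretized L^p distance between two functions given on a uniform grid."""
    if np.isinf(p):
        return float(np.max(np.abs(a - b)))
    return float(np.mean(np.abs(a - b) ** p) ** (1.0 / p))

grid = (np.arange(2000) + 0.5) / 2000
f0 = grid ** 2                                  # hypothetical increasing target
f_hat = f0 + 0.1 * np.sin(8 * np.pi * grid)     # hypothetical non-monotone estimate
f_star = np.sort(f_hat)                         # rearranged estimate

# For each p: (error of the original estimate, error of the rearranged estimate).
errors = {p: (lp_error(f_hat, f0, p), lp_error(f_star, f0, p))
          for p in (1, 2, np.inf)}
```

On the grid, the second entry of each pair is never larger than the first, in line with (2.1); how large the gap is depends on how badly the estimate violates monotonicity.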

2.3. Computation of the Rearranged Estimate. One of the following methods can
be used for computing the rearrangement. Let $\{X_j, j = 1, \ldots, B\}$ be either (1) a net of
equidistant points in $[0, 1]$ or (2) a sample of i.i.d. draws from the uniform distribution
on $[0, 1]$. Then the rearranged estimate $\hat f^*(u)$ at point $u \in \mathcal{X}$ can be approximately
computed as the $u$-quantile of the sample $\{\hat f(X_j), j = 1, \ldots, B\}$. The first method is
deterministic, and the second is stochastic. Thus, for a given number of draws $B$, the
complexity of computing the rearranged estimate $\hat f^*(u)$ in this way is equivalent to the
complexity of computing the sample $u$-quantile in the sample of size $B$.
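
Both computational methods amount to one call to a sample-quantile routine. The following sketch is illustrative only: the function name `rearranged_at`, the default value of `B`, and the test function are assumptions, and `np.quantile` uses its default interpolation rule rather than the exact left-continuous quantile.

```python
import numpy as np

def rearranged_at(f, u, B=10_000, rng=None):
    """Approximate the rearranged value f*(u) as the u-quantile of
    {f(X_j), j = 1, ..., B}. With rng=None the X_j form an equidistant
    net on [0, 1]; otherwise they are i.i.d. uniform draws from rng."""
    if rng is None:
        X = (np.arange(B) + 0.5) / B           # method (1): deterministic net
    else:
        X = rng.uniform(0.0, 1.0, size=B)      # method (2): simulation
    return float(np.quantile(f(X), u))

# Hypothetical example: f(x) = (x - 1/2)^2, whose rearrangement is x^2 / 4.
f = lambda x: (x - 0.5) ** 2
det_value = rearranged_at(f, 0.5)
sim_value = rearranged_at(f, 0.5, rng=np.random.default_rng(0))
```

For this $f$ the exact value is $f^*(1/2) = 1/16$, so both `det_value` and `sim_value` should be close to 0.0625, with the simulated value subject to sampling error.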

The number of evaluations $B$ can depend on the problem. Suppose that the density
function of the random variable $f(X)$, when $X \sim U(0,1)$, is bounded away from
zero over a neighborhood of $f^*(x)$. Then $f^*(x)$ can be computed with the accuracy
of $O_P(1/\sqrt{B})$, as $B \to \infty$, where the rate follows from the results of Knight (2002).
As shown in Chernozhukov, Fernandez-Val, and Galichon (2006a), the density of $f(X)$,
denoted $F_f'(t)$, exists if $f(x)$ is continuously differentiable and the number of elements
in $\{x \in \mathcal{X} : f'(x) = 0\}$ is bounded; in particular,

$$F_f'(t) = \sum_{x \in \{r \in \mathcal{X} : f(r) = t\}} \frac{1}{|f'(x)|}.$$

Thus, the density $F_f'(t)$ is bounded away from zero if $f'(x)$ is bounded away from infinity.
Interestingly, the density has infinite poles at $\{t \in K : \text{there is an } x \text{ such that } f'(x) = 0
\text{ and } f(x) = t\}$.

2.4. The Rearrangement and Its Approximation Property: The Multivariate
Case. In this section, we consider multivariate functions $f : \mathcal{X}^d \to K$, where $\mathcal{X}^d =
[0,1]^d$ and $K$ is a bounded subset of $\mathbb{R}$. The notion of monotonicity we seek to impose
on $f$ is the following: We say that the function $f$ is weakly increasing in $x$ if $f(x') \ge f(x)$
whenever $x' \ge x$. The notation $x' = (x_1', \ldots, x_d') \ge x = (x_1, \ldots, x_d)$ means that one vector
is weakly larger than the other in each of the components, that is, $x_j' \ge x_j$ for each
$j = 1, \ldots, d$. In what follows, we use the notation $f(x_j, x_{-j})$ to denote the dependence of
$f$ on its $j$-th argument, $x_j$, and all other arguments, $x_{-j}$, that exclude $x_j$. The notion
of monotonicity above is equivalent to the requirement that for each $j$ in $1, \ldots, d$ the
mapping $x_j \mapsto f(x_j, x_{-j})$ is weakly increasing in $x_j$, for each $x_{-j}$ in $\mathcal{X}^{d-1}$.

Define the rearrangement operator $R_j$ and the rearranged function $f_j^*(x)$ with respect to
the $j$-th argument as follows:

$$f_j^*(x) := R_j \circ f(x) := \inf\left\{ y : \left[ \int_{\mathcal{X}} 1\{f(x_j', x_{-j}) \le y\}\,dx_j' \right] \ge x_j \right\}.$$

This is the one-dimensional increasing rearrangement applied to the one-dimensional function
$x_j \mapsto f(x_j, x_{-j})$, holding the other arguments $x_{-j}$ fixed. The rearrangement is
applied for every value of the other arguments $x_{-j}$.

Let $\pi = (\pi_1, \ldots, \pi_d)$ be an ordering, i.e., a permutation, of the integers $1, \ldots, d$. Let us
define the $\pi$-rearrangement operator $R_\pi$ and the $\pi$-rearranged function $f_\pi^*(x)$ as follows:

$$f_\pi^*(x) := R_\pi \circ f(x) := R_{\pi_1} \circ \cdots \circ R_{\pi_d} \circ f(x).$$

For any ordering $\pi$, the $\pi$-rearrangement operator rearranges the function with respect
to all of its arguments. As shown below, the resulting function $f_\pi^*(x)$ is weakly increasing
in $x$.

In general, two different orderings $\pi$ and $\pi'$ of $1, \ldots, d$ can yield different rearranged
functions $f_\pi^*(x)$ and $f_{\pi'}^*(x)$. Therefore, to resolve the conflict among rearrangements
done with different orderings, we may consider averaging among them: letting $\Pi$ be any
collection of distinct orderings $\pi$, we can define the average rearrangement as

$$f^*(x) := \frac{1}{|\Pi|} \sum_{\pi \in \Pi} f_\pi^*(x), \tag{2.4}$$

where $|\Pi|$ denotes the number of elements in the set of orderings $\Pi$. As shown below, the
approximation error of the average rearrangement is weakly smaller than the average of
approximation errors of individual $\pi$-rearrangements.
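
On a rectangular grid, the operator $R_j$ is a sort along one array axis, so a $\pi$-rearrangement is a sequence of axis-wise sorts. The sketch below is a minimal bivariate illustration; the function names, the zero-based axis convention, and the target and estimate are all assumptions made here.

```python
import numpy as np

def pi_rearrange(values, order):
    """Apply R_{pi_1} o ... o R_{pi_d} to gridded values. `order` lists
    zero-based axes (pi_1, ..., pi_d); the last axis listed is sorted first."""
    out = np.asarray(values, dtype=float)
    for axis in reversed(order):
        out = np.sort(out, axis=axis)
    return out

def average_rearrangement(values, orders):
    """Average of the pi-rearrangements over a collection of orderings."""
    return np.mean([pi_rearrange(values, o) for o in orders], axis=0)

def lp_error(a, b, p):
    """Discretized L^p distance on a uniform grid."""
    return float(np.mean(np.abs(a - b) ** p) ** (1.0 / p))

n = 50
g = (np.arange(n) + 0.5) / n
x1, x2 = np.meshgrid(g, g, indexing="ij")
f0 = x1 + x2                                                          # hypothetical target
f_hat = f0 + 0.3 * np.sin(6 * np.pi * x1) * np.cos(4 * np.pi * x2)    # hypothetical estimate

f_12 = pi_rearrange(f_hat, (0, 1))
f_21 = pi_rearrange(f_hat, (1, 0))
f_avg = average_rearrangement(f_hat, [(0, 1), (1, 0)])
```

The two orderings generally give different arrays, while each result is weakly increasing along both axes; `lp_error` can be used to compare each of them, and their average, with the original estimate.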

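The following proposition describes the properties of multivariate $\pi$-rearrangements:

Proposition 2. Let the target function $f_0 : \mathcal{X}^d \to K$ be weakly increasing and measurable
in $x$. Let $\hat f : \mathcal{X}^d \to K$ be a measurable function that is an initial estimate of the
target function $f_0$. Let $\bar f : \mathcal{X}^d \to K$ be another estimate of $f_0$, which is measurable in $x$,
including, for example, a rearranged $\hat f$ with respect to some of the arguments. Then,

1. For each ordering $\pi$ of $1, \ldots, d$, the $\pi$-rearranged estimate $\hat f_\pi^*(x)$ is weakly increasing
in $x$. Moreover, $\hat f^*(x)$, an average of $\pi$-rearranged estimates, is weakly increasing in $x$.

2. (a) For any $j$ in $1, \ldots, d$ and any $p$ in $[1, \infty]$, the rearrangement of $\bar f$ with respect
to the $j$-th argument produces a weak reduction in the approximation error:

$$\left[ \int_{\mathcal{X}^d} \big| \bar f_j^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} \le \left[ \int_{\mathcal{X}^d} \big| \bar f(x) - f_0(x) \big|^p \, dx \right]^{1/p}. \tag{2.5}$$

(b) Consequently, a $\pi$-rearranged estimate $\hat f_\pi^*(x)$ of $\hat f(x)$ weakly reduces the approximation
error of the original estimate:

$$\left[ \int_{\mathcal{X}^d} \big| \hat f_\pi^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} \le \left[ \int_{\mathcal{X}^d} \big| \hat f(x) - f_0(x) \big|^p \, dx \right]^{1/p}. \tag{2.6}$$

3. Suppose that $\bar f(x)$ and $f_0(x)$ have the following properties: there exist subsets
$\mathcal{X}_j \subset \mathcal{X}$ and $\mathcal{X}_j' \subset \mathcal{X}$, each of measure $\delta > 0$, and a subset $\mathcal{X}_{-j} \subseteq \mathcal{X}^{d-1}$, of measure
$\nu > 0$, such that for all $x = (x_j, x_{-j})$ and $x' = (x_j', x_{-j})$, with $x_j' \in \mathcal{X}_j'$, $x_j \in \mathcal{X}_j$,
$x_{-j} \in \mathcal{X}_{-j}$, we have that (i) $x_j' > x_j$, (ii) $\bar f(x) > \bar f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$,
for some $\epsilon > 0$.

(a) Then, for any $p \in [1, \infty]$,

$$\left[ \int_{\mathcal{X}^d} \big| \bar f_j^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} \le \left[ \int_{\mathcal{X}^d} \big| \bar f(x) - f_0(x) \big|^p \, dx - \eta_p \delta \nu \right]^{1/p}, \tag{2.7}$$

where $\eta_p = \inf\{ |v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p \}$, and $\eta_p > 0$ for $p \in (1, \infty)$, with
the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$.

(b) Further, for an ordering $\pi = (\pi_1, \ldots, \pi_k, \ldots, \pi_d)$ with $\pi_k = j$, let $\bar f$ be a partially
rearranged function, $\bar f(x) = R_{\pi_{k+1}} \circ \cdots \circ R_{\pi_d} \circ \hat f(x)$ (for $k = d$ we set $\bar f(x) = \hat f(x)$). If
the function $\bar f(x)$ and the target function $f_0(x)$ satisfy the condition stated above, then,
for any $p \in [1, \infty]$,

$$\left[ \int_{\mathcal{X}^d} \big| \hat f_\pi^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} \le \left[ \int_{\mathcal{X}^d} \big| \hat f(x) - f_0(x) \big|^p \, dx - \eta_p \delta \nu \right]^{1/p}. \tag{2.8}$$

4. The approximation error of an average rearrangement is weakly smaller than the
average approximation error of the individual $\pi$-rearrangements: For any $p \in [1, \infty]$,

$$\left[ \int_{\mathcal{X}^d} \big| \hat f^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[ \int_{\mathcal{X}^d} \big| \hat f_\pi^*(x) - f_0(x) \big|^p \, dx \right]^{1/p}. \tag{2.9}$$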



This proposition generalizes the results of Proposition 1 to the multivariate case,
also demonstrating several features unique to the multivariate case. We see that the
$\pi$-rearranged functions are monotonic in all of the arguments. The rearrangement along
any argument improves the approximation properties of the estimate. Moreover, the
improvement is strict when the rearrangement with respect to the $j$-th argument is performed
on an estimate that is decreasing in the $j$-th argument, while the target function
is increasing in the same $j$-th argument, in the sense precisely defined in the proposition.
In addition, averaging different $\pi$-rearrangements is better (on average) than using a single
$\pi$-rearrangement chosen at random. All other basic implications of the proposition are
similar to those discussed for the univariate case.



3. Proofs of Propositions 

3.1. Proof of Proposition 1. The first part establishes the weak inequality, following
in part the strategy in Lorentz's (1953) proof. The proof focuses directly on obtaining
the result stated in the proposition. The second part establishes the strong inequality.

Proof of Part 1. We assume at first that the functions $\hat f(\cdot)$ and $f_0(\cdot)$ are simple
functions, constant on intervals $((s-1)/r, s/r]$, $s = 1, \ldots, r$. For any simple $f(\cdot)$ with $r$
steps, let $f$ denote the $r$-vector with the $s$-th element, denoted $f_s$, equal to the value of
$f(\cdot)$ on the $s$-th interval. Let us define the sorting operator $S(f)$ as follows: Let $\ell$ be an
integer in $1, \ldots, r$ such that $f_\ell > f_m$ for some $m > \ell$. If $\ell$ does not exist, set $S(f) = f$. If
$\ell$ exists, set $S(f)$ to be an $r$-vector with the $\ell$-th element equal to $f_m$, the $m$-th element
equal to $f_\ell$, and all other elements equal to the corresponding elements of $f$. For any
submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$, by $f_\ell \ge f_m$, $f_{0m} \ge f_{0\ell}$ and the definition of
submodularity,

$$L(f_m, f_{0\ell}) + L(f_\ell, f_{0m}) \le L(f_\ell, f_{0\ell}) + L(f_m, f_{0m}).$$

Therefore, we conclude that

$$\int_{\mathcal{X}} L(S(\hat f)(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(\hat f(x), f_0(x))\,dx, \tag{3.1}$$

using that we integrate simple functions.

Applying the sorting operation a sufficient finite number of times to $\hat f$, we obtain a
completely sorted, that is, rearranged, vector $\hat f^*$. Thus, we can express $\hat f^*$ as a finite
composition $\hat f^* = S \circ \cdots \circ S(\hat f)$. By repeating the argument above, each composition
weakly reduces the approximation error. Therefore,

$$\int_{\mathcal{X}} L(\hat f^*(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(\underbrace{S \circ \cdots \circ S}_{\text{finite times}}(\hat f)(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(\hat f(x), f_0(x))\,dx. \tag{3.2}$$

Furthermore, this inequality is extended to general measurable functions $\hat f(\cdot)$ and $f_0(\cdot)$
mapping $\mathcal{X}$ to $K$ by taking a sequence of bounded simple functions $\hat f^{(r)}(\cdot)$ and $f_0^{(r)}(\cdot)$
converging to $\hat f(\cdot)$ and $f_0(\cdot)$ almost everywhere as $r \to \infty$. The almost everywhere
convergence of $\hat f^{(r)}(\cdot)$ to $\hat f(\cdot)$ implies the almost everywhere convergence of its quantile
function $\hat f^{*(r)}(\cdot)$ to the quantile function of the limit, $\hat f^*(\cdot)$. Since inequality (3.2) holds
along the sequence, the dominated convergence theorem implies that (3.2) also holds for
the general case. □
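
The sorting operator $S$ used in the proof can be traced on a small vector. The sketch below is an illustration under assumptions: it picks the first out-of-order pair it finds (the proof allows any such pair), uses the loss $L(v, t) = |v - t|^p$ with $p = 2$, and the vectors `f_start` and `f0_vec` are hypothetical.

```python
def sort_step(f):
    """One application of S: swap the first pair (l, m) with l < m and
    f[l] > f[m]; return f unchanged if no such pair exists."""
    f = list(f)
    for l in range(len(f)):
        for m in range(l + 1, len(f)):
            if f[l] > f[m]:
                f[l], f[m] = f[m], f[l]
                return f
    return f

def loss(f, f0, p=2):
    """Integral of |f - f0|^p for simple functions with r equal-length steps."""
    return sum(abs(a - b) ** p for a, b in zip(f, f0)) / len(f)

def sort_path(f, f0, p=2):
    """Apply S until the vector is sorted, recording the loss after each step."""
    f = list(f)
    losses = [loss(f, f0, p)]
    while True:
        g = sort_step(f)
        if g == f:
            return f, losses
        f = g
        losses.append(loss(f, f0, p))

f0_vec = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical increasing target values
f_start = [3.0, 1.0, 4.0, 2.0, 5.0]    # hypothetical unsorted estimate values
f_sorted, losses = sort_path(f_start, f0_vec)
```

Each swap removes at least one inversion, so the loop stops after finitely many steps, and the recorded losses form a weakly decreasing sequence, mirroring the chain of inequalities in (3.2).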

Proof of Part 2. Let us first consider the case of simple functions, as defined in Part
1. We take the functions to satisfy the following hypotheses: there exist regions $\mathcal{X}_0$ and
$\mathcal{X}_0'$, each of measure greater than $\delta > 0$, such that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}_0'$, we have
that (i) $x' > x$, (ii) $\hat f(x) > \hat f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$, for $\epsilon > 0$ specified in
the proposition. For any strictly submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$ we have that

$$\eta = \inf\{ L(v', t) + L(v, t') - L(v, t) - L(v', t') \} > 0,$$

where the infimum is taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and
$t' \ge t + \epsilon$.

We can begin sorting by exchanging an element $\hat f(x)$, $x \in \mathcal{X}_0$, of the $r$-vector $\hat f$ with an
element $\hat f(x')$, $x' \in \mathcal{X}_0'$, of the $r$-vector $\hat f$. This induces a sorting gain of at least $\eta$ times $1/r$.
The total mass of points that can be sorted in this way is at least $\delta$. We then proceed to
sort all of these points in this way, and then continue with the sorting of other points.
After the sorting is completed, the total gain from sorting is at least $\delta\eta$. That is,

$$\int_{\mathcal{X}} L(\hat f^*(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(\hat f(x), f_0(x))\,dx - \delta\eta.$$

We then extend this inequality to general measurable functions exactly as in the
proof of Part 1. □

3.2. Proof of Proposition 2. The proof consists of the following four parts. 

Proof of Part 1. We prove the claim by induction. The claim is true for $d = 1$ by
$\hat f^*(x)$ being a quantile function. We then consider any $d \ge 2$. Suppose the claim is
true in $d - 1$ dimensions. If so, then the estimate $\bar f(x_j, x_{-j})$, obtained from the original
estimate $\hat f(x)$ after applying the rearrangement to all arguments $x_{-j}$ of $x$, except for the
argument $x_j$, must be weakly increasing in $x_{-j}$ for each $x_j$. Thus, for any $x_{-j}' \ge x_{-j}$,
we have that

$$\bar f(X_j, x_{-j}') \ge \bar f(X_j, x_{-j}) \quad \text{for } X_j \sim U(0,1). \tag{3.3}$$

Therefore, the random variable on the left of (3.3) dominates the random variable on
the right of (3.3) in the stochastic sense. Therefore, the quantile function of the random
variable on the left dominates the quantile function of the random variable on the right,
namely

$$\bar f_j^*(x_j, x_{-j}') \ge \bar f_j^*(x_j, x_{-j}) \quad \text{for each } x_j \in \mathcal{X} = (0,1). \tag{3.4}$$

Moreover, for each $x_{-j}$, the function $x_j \mapsto \bar f_j^*(x_j, x_{-j})$ is weakly increasing by virtue of
being a quantile function. We conclude therefore that $x \mapsto \bar f_j^*(x)$ is weakly increasing
in all of its arguments at all points $x \in \mathcal{X}^d$. The claim of Part 1 of the Proposition now
follows by induction. □

Proof of Part 2 (a). By Proposition 1, we have that for each $x_{-j}$,

$$\int_{\mathcal{X}} \big| \bar f_j^*(x_j, x_{-j}) - f_0(x_j, x_{-j}) \big|^p \, dx_j \le \int_{\mathcal{X}} \big| \bar f(x_j, x_{-j}) - f_0(x_j, x_{-j}) \big|^p \, dx_j. \tag{3.5}$$

Now, the claim follows by integrating with respect to $x_{-j}$ and taking the $p$-th root of
both sides. For $p = \infty$, the claim follows by taking the limit as $p \to \infty$. □

Proof of Part 2 (b). We first apply the inequality of Part 2(a) to $\bar f(x) = \hat f(x)$, then
to $\bar f(x) = R_{\pi_d} \circ \hat f(x)$, then to $\bar f(x) = R_{\pi_{d-1}} \circ R_{\pi_d} \circ \hat f(x)$, and so on. In doing so,
we recursively generate a sequence of weak inequalities that imply the inequality (2.6)
stated in the Proposition. □
Proof of Part 3 (a). For each $x_{-j} \in \mathcal{X}^{d-1} \setminus \mathcal{X}_{-j}$, by Part 2(a), we have the weak
inequality (3.5), and for each $x_{-j} \in \mathcal{X}_{-j}$, by the inequality for the univariate case stated
in Proposition 1 Part 2, we have the strong inequality

$$\int_{\mathcal{X}} \big| \bar f_j^*(x_j, x_{-j}) - f_0(x_j, x_{-j}) \big|^p \, dx_j \le \int_{\mathcal{X}} \big| \bar f(x_j, x_{-j}) - f_0(x_j, x_{-j}) \big|^p \, dx_j - \eta_p \delta, \tag{3.6}$$

where $\eta_p$ is defined in the same way as in Proposition 1. Integrating the weak inequality
(3.5) over $x_{-j} \in \mathcal{X}^{d-1} \setminus \mathcal{X}_{-j}$, of measure $1 - \nu$, and the strong inequality (3.6) over $\mathcal{X}_{-j}$,
of measure $\nu$, we obtain

$$\int_{\mathcal{X}^d} \big| \bar f_j^*(x) - f_0(x) \big|^p \, dx \le \int_{\mathcal{X}^d} \big| \bar f(x) - f_0(x) \big|^p \, dx - \eta_p \delta \nu. \tag{3.7}$$

The claim now follows. □

Proof of Part 3 (b). As in Part 2(b), we can recursively obtain a sequence of weak
inequalities describing the improvements in approximation error from rearranging
sequentially with respect to the individual arguments. Moreover, at least one of the
inequalities can be strengthened to be of the form stated in (3.7), from the assumption
of the claim. The resulting system of inequalities yields the inequality (2.8) stated in
the proposition. □

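Proof of Part 4. We can write

$$\left[ \int_{\mathcal{X}^d} \big| \hat f^*(x) - f_0(x) \big|^p \, dx \right]^{1/p} = \left[ \int_{\mathcal{X}^d} \Big| \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \big( \hat f_\pi^*(x) - f_0(x) \big) \Big|^p \, dx \right]^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[ \int_{\mathcal{X}^d} \big| \hat f_\pi^*(x) - f_0(x) \big|^p \, dx \right]^{1/p}, \tag{3.8}$$

where the last inequality follows by pulling out $1/|\Pi|$ and then applying the triangle
inequality for the $L^p$ norm. □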



4. Illustrations 

In this section we provide an empirical application to biometric age-height charts.
We show how the rearrangement monotonizes and improves various nonparametric estimates,
and then we quantify the improvement in a simulation example that mimics the
empirical application.
4.1. An Empirical Illustration with Age-Height Reference Charts. Since their
introduction by Quetelet in the 19th century, reference growth charts have become common
tools to assess an individual's health status. These charts describe the evolution
of individual anthropometric measures, such as height, weight, and body mass index,
across different ages. See Cole (1988) for a classical work on the subject and Wei, Pere,
Koenker, and He (2006) for a recent analysis from a quantile regression perspective and
additional references.

To illustrate the properties of the rearrangement method we consider the estimation
of growth charts for height. It is clear that height should naturally follow an increasing
relationship with age. Our data consist of repeated cross-sectional measurements of
height and age from the 2003-2004 National Health and Nutrition Examination Survey collected by
the National Center for Health Statistics. Height is measured as standing height in
centimeters, and age is recorded in months and expressed in years. To avoid confounding
factors that might affect the relationship between age and height, we restrict the sample
to US-born white males aged two through twenty. Our final sample consists of 533 subjects
almost evenly distributed across these ages.

Let $Y$ and $X$ denote height and age, respectively. Let $E[Y \mid X = x]$ denote the conditional
expectation of $Y$ given $X = x$, and $Q_Y(u \mid X = x)$ denote the $u$-th quantile of $Y$
given $X = x$, where $u$ is the quantile index. The population functions of interest are
(1) the conditional expectation function (CEF), (2) the conditional quantile functions
(CQF) for several quantile indices (5%, median, and 95%), and (3) the entire conditional
quantile process (CQP) for height given age. In the first case, the target function
$x \mapsto f_0(x)$ is $x \mapsto E[Y \mid X = x]$; in the second case, the target function $x \mapsto f_0(x)$ is
$x \mapsto Q_Y(u \mid X = x)$, for $u$ = 5%, 50%, and 95%; and, in the third case, the target function
$(u, x) \mapsto f_0(u, x)$ is $(u, x) \mapsto Q_Y(u \mid X = x)$. The natural monotonicity requirements
for the target functions are the following: The CEF $x \mapsto E[Y \mid X = x]$ and the CQF
$x \mapsto Q_Y(u \mid X = x)$ should be increasing in age $x$, and the CQP $(u, x) \mapsto Q_Y(u \mid X = x)$
should be increasing in both age $x$ and the quantile index $u$.

We estimate the target functions using non-parametric ordinary least squares or quantile
regression techniques and then rearrange the estimates to satisfy the monotonicity
requirements. We consider (a) kernel, (b) local linear, (c) spline, (d) global polynomial,
(e) Fourier, and (f) flexible Fourier methods. For the kernel method, we provide a fit
on a cell-by-cell basis, with each cell corresponding to one month. For the local linear
method, we choose a bandwidth of one year and a box kernel. For the spline method,
we use cubic B-splines with a knot sequence {3, 5, 8, 10, 11.5, 13, 14.5, 16, 18}, following
Wei, Pere, Koenker, and He (2006). For the global polynomial method, we fit a quartic
polynomial. For the Fourier method, we employ eight trigonometric terms, with four
sines and four cosines. For the flexible Fourier method, we use a quadratic polynomial
and four trigonometric terms, with two sines and two cosines. Finally, for the estimation
of the conditional quantile process, we use a net of two hundred quantile indices
{0.005, 0.010, ..., 0.995}. In the choice of the parameters for the different methods, we
select values that either have been used in the previous empirical work or give rise to
specifications with similar complexities for the different methods.

Panels A-F of Figure 1 show the original and rearranged estimates of the conditional
expectation function for the different methods. All the estimated curves have
trouble capturing the slowdown in the growth of height after age sixteen and yield
non-monotonic curves for the highest values of age. The Fourier series have a special difficulty
approximating the aperiodic age-height relationship. The rearranged estimates correct
the non-monotonicity of the original estimates, providing weakly increasing curves that
coincide with the original estimates in the parts where the latter are monotonic. Moreover,
the rearranged estimates necessarily improve upon the original estimates, since,
by the theoretical results derived earlier, they are closer to the true functions than the
original estimates. We quantify this improvement in the next subsection.

Figure 2 displays similar but more pronounced non-monotonicity patterns for the
estimates of the conditional quantile functions. The rearrangement again performs well 
in delivering curves that improve upon the original estimates and that satisfy the natural 
monotonicity requirement. 

Figures 3-8 illustrate the multivariate rearrangement of the conditional quantile process 
(CQP) along both the age and the quantile index arguments. We plot in three 
dimensions the original estimate, its age rearrangement, its quantile rearrangement, and 
its average multivariate rearrangement (the average of the age-quantile and quantile-age 
rearrangements). We also plot the corresponding contour surfaces. (Here, we do not 
show the multivariate age-quantile and quantile-age rearrangements separately, because 
they are very similar to the average multivariate rearrangement.) We see from the contour 
plots that, for all of the estimation methods considered, the estimated CQP is 
non-monotone in age and non-monotone in the quantile index at extremal values of this 
index. The contour plots for the estimates based on the Fourier series best illustrate the 
non-monotonicity problem. We see that the average multivariate rearrangement fixes the 
non-monotonicity problem, and delivers an estimate of the CQP that is monotone in 
both the age and the quantile index arguments. Furthermore, by the theoretical results 
of the paper, the multivariate rearranged estimates necessarily improve upon the original 
estimates. 
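
As a rough sketch of the average multivariate rearrangement on a grid, the following code assumes that the estimated CQP is stored as a list of rows indexed by the quantile index u, with columns indexed by age x, and that the age-quantile rearrangement means sorting along age first and then along the quantile index (and the reverse for quantile-age). The helper names and the small grid are hypothetical.

```python
def sort_along_age(grid):
    """Sort each row, i.e. rearrange along the age argument for fixed u."""
    return [sorted(row) for row in grid]


def sort_along_quantile(grid):
    """Sort each column, i.e. rearrange along the quantile index for fixed age."""
    sorted_cols = [sorted(col) for col in zip(*grid)]
    return [list(row) for row in zip(*sorted_cols)]


def average_rearrangement(grid):
    """Average of the age-quantile and quantile-age rearrangements."""
    age_then_u = sort_along_quantile(sort_along_age(grid))
    u_then_age = sort_along_age(sort_along_quantile(grid))
    return [[0.5 * (a + b) for a, b in zip(ra, rb)]
            for ra, rb in zip(age_then_u, u_then_age)]


def is_monotone(grid):
    """Check weak monotonicity in both arguments."""
    rows_ok = all(a <= b for row in grid for a, b in zip(row, row[1:]))
    cols_ok = all(a <= b for col in zip(*grid) for a, b in zip(col, col[1:]))
    return rows_ok and cols_ok


# Hypothetical 3x3 estimate that is non-monotone in both arguments.
estimate = [[3.0, 1.0, 2.0],
            [2.0, 5.0, 4.0],
            [6.0, 4.0, 9.0]]
averaged = average_rearrangement(estimate)
print(is_monotone(estimate), is_monotone(averaged))
```

Sorting the columns of a grid whose rows are already sorted keeps the rows sorted, so each of the two composite rearrangements is monotone in both arguments, and so is their average.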

4.2. Monte-Carlo Illustration. The following Monte Carlo experiment quantifies the 
improvement in the estimation/approximation properties of the rearranged estimates 
relative to the original estimates. The experiment closely matches the empirical appli- 
cation presented above. 

Specifically, we consider the design where the outcome variable $Y$ equals a location 
function plus a disturbance $\epsilon$, $Y = Z(X)'\beta + \epsilon$, and the disturbance is independent of the 
regressor $X$. The vector $Z(X)$ includes a constant and a piecewise linear transformation 
of the regressor $X$ with three changes of slope, namely 
$Z(X) = (1, X, 1\{X > 5\} \cdot (X - 5), 1\{X > 10\} \cdot (X - 10), 1\{X > 15\} \cdot (X - 15))$. 
This design implies the conditional expectation function 

$$E[Y|X] = Z(X)'\beta, \qquad (4.1)$$

and the conditional quantile function 

$$Q_Y(u|X) = Z(X)'\beta + Q_\epsilon(u), \qquad (4.2)$$

where $Q_\epsilon$ denotes the quantile function of the disturbance. 
We select the parameters of the design to match the empirical example of growth charts 
in the previous subsection. Thus, we set the parameter $\beta$ equal to the ordinary least 
squares estimate obtained in the growth chart data, namely (71.25, 8.13, -2.72, 1.78, 
-6.43). This parameter value and the location specification (4.2) imply a model for CEF 
and CQP that is monotone in age over the range of 2-20. To generate the values of the 
dependent variable, we draw disturbances from a normal distribution with the mean and 
variance equal to the mean and variance of the estimated residuals, $\epsilon = Y - Z(X)'\beta$, 
in the growth chart data. We fix the regressor $X$ in all of the replications to be the 
observed values of age in the growth chart data set. In each replication, we estimate the 
CEF and CQP using the nonparametric methods described in the previous section. 
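
The design can be sketched in a few lines. The coefficient vector and the slope changes at ages 5, 10, and 15 come from the text; the residual standard deviation `sigma` and the age values passed to `simulate` are hypothetical placeholders, since the growth chart data are not reproduced here.

```python
import random

# OLS coefficients reported in the text.
BETA = (71.25, 8.13, -2.72, 1.78, -6.43)
KNOTS = (5.0, 10.0, 15.0)  # ages at which the slope changes


def z(x):
    """Regressor vector Z(x) = (1, x, 1{x>5}(x-5), 1{x>10}(x-10), 1{x>15}(x-15))."""
    return (1.0, x) + tuple(max(x - k, 0.0) for k in KNOTS)


def cef(x):
    """Conditional expectation function Z(x)'beta."""
    return sum(b * zi for b, zi in zip(BETA, z(x)))


def segment_slopes():
    """Slope of the CEF on each of the four linear segments."""
    slopes = [BETA[1]]
    for change in BETA[2:]:
        slopes.append(slopes[-1] + change)
    return slopes


def simulate(ages, sigma, mu=0.0, rng=random):
    """Draw one replication: Y = Z(X)'beta + normal disturbance.

    sigma and mu stand in for the residual moments, which are not given in the text.
    """
    return [cef(x) + rng.gauss(mu, sigma) for x in ages]


print(segment_slopes())
print(cef(2.0), cef(20.0))
```

All four segment slopes are positive, which is why this parameter value yields a CEF that is increasing in age over the range 2-20.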

In Table 1 we report the average $L^p$ errors (for $p = 1, 2, 3, 4$, and $\infty$) for the original 
estimates and the rearranged estimates of the CEF. We also report the relative efficiency 
of the two estimates, measured as the ratio of the average error of the rearranged estimate 
to the average error of the original estimate. We calculate the average $L^p$ error as the 
Monte Carlo average of 

$$L^p := \left[ \int_{\mathcal{X}} |\hat{f}(x) - f_0(x)|^p \, dx \right]^{1/p},$$

where the target function $f_0(x)$ is the CEF $E[Y|X = x]$, and the estimate $\hat{f}(x)$ denotes 
either the original nonparametric estimate of the CEF or its increasing rearrangement. 
For all of the methods considered, we find that the rearranged curves estimate the true 
CEF more accurately than the original curves, providing a 2% to 84% reduction in the 
average error, depending on the method and the norm (i.e., the value of $p$). 
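
A discrete version of this error measure can be sketched as follows, assuming the curves are evaluated on an equally spaced grid with spacing `step` (a hypothetical parameter) and that the case $p = \infty$ is read as the maximum absolute deviation. The relative efficiency is then the ratio of two Monte Carlo averages; the last line recomputes one ratio from the kernel entries of Table 1.

```python
import math


def lp_error(estimate, target, p, step=1.0):
    """Discrete L^p distance between two curves sampled on the same grid."""
    diffs = [abs(e - t) for e, t in zip(estimate, target)]
    if math.isinf(p):
        return max(diffs)
    return (step * sum(d ** p for d in diffs)) ** (1.0 / p)


def relative_efficiency(avg_error_rearranged, avg_error_original):
    """Ratio of the average error of the rearranged estimate to the original."""
    return avg_error_rearranged / avg_error_original


# Hypothetical curves on a five-point grid.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
original = [1.1, 2.2, 3.5, 4.9, 4.4]
rearranged = sorted(original)

for p in (1, 2, 3, 4, math.inf):
    print(p, lp_error(original, target, p), lp_error(rearranged, target, p))

# Kernel method, p = 1, using the averages reported in Table 1.
print(round(relative_efficiency(1.33, 3.69), 2))
```

A ratio below one means the rearranged estimate has the smaller average error; the percentage reductions quoted in the text correspond to one minus this ratio.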

In Table 2 we report the average $L^p$ errors for the original estimates of the conditional 
quantile process and their multivariate rearrangement with respect to the age and quantile 
index arguments. We also report the ratio of the average error of the rearranged 
estimate to the average error of the original estimate. The average $L^p$ error is the Monte 
Carlo average of 

$$L^p := \left[ \int_{\mathcal{U}} \int_{\mathcal{X}} |\hat{f}(u, x) - f_0(u, x)|^p \, dx \, du \right]^{1/p},$$

where the target function $f_0(u, x)$ is the conditional quantile process $Q_Y(u|X = x)$, and 
the estimate $\hat{f}(u, x)$ denotes either the original nonparametric estimate of the conditional 
quantile process or its multivariate rearrangement. We present the results for the average 
multivariate rearrangement only. The age-quantile and quantile-age multivariate 
rearrangements give errors that are very similar to those of their average, 
and we therefore do not report them separately. For all the methods considered, 
we find that the multivariate rearranged curves estimate the true CQP more accurately 
than the original curves, providing a 4% to 74% reduction in the approximation error, 
depending on the method and the norm. 
In Table 3 we report the average $L^p$ error for the univariate rearrangements of the 
conditional quantile function along either the age argument or the quantile index ar- 
gument. We also report the ratio of the average error for these rearrangements to the 
average error of the original estimates. For all of the methods considered, we find that 
these rearranged curves estimate the true CQP more accurately than the original curves, 
providing noticeable reductions in the estimation error. Moreover, in this case the re- 
arrangement along the age argument is more effective in reducing the estimation error 
than the rearrangement along the quantile index. Furthermore, by comparing Tables 2 
and 3, we also see that the multivariate rearrangement provides an improvement over 
the individual univariate rearrangements, yielding estimates of the CQP that are often 
much closer to the true process. 

5. Conclusion 

Suppose that a target function is known to be weakly increasing, and we have an 
original estimate of this function, which is not weakly increasing. Common estima- 
tion methods provide estimates with such a property. We show that these estimates 
can always be improved using rearrangement techniques. The univariate and multivari- 
ate rearrangement methods transform the original estimate to a monotonic estimate. 
The resulting monotonic estimate is in fact closer to the target function in common 
metrics than the original estimate. We illustrate these theoretical results with a com- 
putational example and an empirical example, dealing with estimation of conditional 
mean and quantile functions of height given age. The rearrangement both monotonizes 
and improves the original non-monotone estimates. It would be interesting to determine 
whether this improved estimation/approximation property carries over to other methods 
of monotonization. We leave this extension for future research. 
Table 1. L^p Estimation/Approximation Error of Original and Rearranged 
Estimates of the Conditional Expectation Function, for p = 1, 2, 3, 4, 
and ∞. Univariate Rearrangement. 

  p     L^p_O   L^p_R   L^p_R/L^p_O      L^p_O   L^p_R   L^p_R/L^p_O

        A. Kernel                        B. Local Polynomial
  1     3.69    1.33    0.36             0.79    0.76    0.96
  2     4.80    1.84    0.38             1.00    0.96    0.96
  3     5.81    2.46    0.42             1.17    1.13    0.96
  4     6.72    3.12    0.46             1.33    1.28    0.96
  ∞     16.8    9.84    0.58             2.96    2.81    0.95

        C. Splines                       D. Quartic
  1     0.87    0.81    0.93             1.33    1.19    0.89
  2     1.10    1.02    0.93             1.64    1.46    0.89
  3     1.31    1.22    0.93             1.89    1.68    0.89
  4     1.52    1.39    0.92             2.10    1.87    0.89
  ∞     3.72    3.19    0.86             4.38    3.79    0.87

        E. Fourier                       F. Flexible Fourier
  1     6.57    3.21    0.49             0.73    0.72    0.97
  2     10.7    3.79    0.35             0.91    0.89    0.97
  3     15.2    4.24    0.28             1.06    1.04    0.98
  4     19.0    4.59    0.24             1.18    1.16    0.98
  ∞     48.9    7.79    0.16             2.44    2.40    0.98

Notes: The table is based on 10,000 replications. 
L^p_O is the L^p error of the original estimate. 
L^p_R is the L^p error of the rearranged estimate. 



Table 2. L^p Estimation/Approximation Error of Original and Rearranged 
Estimates of the Conditional Quantile Process, for p = 1, 2, 3, 4, 
and ∞. Average Multivariate Rearrangement. 

  p     L^p_O   L^p_RR  L^p_RR/L^p_O     L^p_O   L^p_RR  L^p_RR/L^p_O

        A. Kernel                        B. Local Polynomial
  1     5.35    3.13    0.58             1.21    1.09    0.91
  2     6.97    4.37    0.63             1.61    1.46    0.91
  3     8.40    5.49    0.65             2.03    1.84    0.91
  4     9.72    6.49    0.67             2.48    2.24    0.91
  ∞     34.3    26.4    0.77             12.3    10.4    0.84

        C. Splines                       D. Quartic
  1     1.33    1.20    0.90             1.49    1.35    0.90
  2     1.78    1.60    0.90             1.87    1.69    0.90
  3     2.30    2.03    0.88             2.23    1.99    0.89
  4     2.92    2.50    0.86             2.62    2.29    0.87
  ∞     16.9    12.1    0.72             12.6    8.61    0.68

        E. Fourier                       F. Flexible Fourier
  1     6.72    4.18    0.62             1.05    1.00    0.96
  2     13.7    5.35    0.39             1.38    1.31    0.95
  3     20.8    6.36    0.31             1.72    1.63    0.95
  4     26.7    7.25    0.27             2.12    1.98    0.94
  ∞     84.9    21.9    0.26             10.9    9.13    0.84

Notes: The table is based on 1,000 replications. 
L^p_O is the L^p error of the original estimate. 
L^p_RR is the L^p error of the average multivariate rearranged estimate. 
Table 3. L^p Estimation/Approximation Error of Rearranged Estimates 
of the Conditional Quantile Process, for p = 1, 2, 3, 4, and ∞. Univariate 
Rearrangements. 

  p     L^p_RU  L^p_RX  L^p_RU/L^p_O  L^p_RX/L^p_O     L^p_RU  L^p_RX  L^p_RU/L^p_O  L^p_RX/L^p_O

        A. Kernel                                      B. Local Polynomial
  1     5.35    3.13    1.00          0.58             1.20    1.10    1.00          0.91
  2     6.97    4.37    1.00          0.63             1.60    1.47    1.00          0.91
  3     8.40    5.49    1.00          0.65             2.01    1.85    0.99          0.91
  4     9.72    6.49    1.00          0.67             2.45    2.26    0.99          0.91
  ∞     34.3    26.4    1.00          0.77             11.8    10.8    0.96          0.88

        C. Splines                                     D. Quartic
  1     1.31    1.21    0.99          0.91             1.49    1.35    1.00          0.91
  2     1.75    1.63    0.98          0.91             1.87    1.69    1.00          0.90
  3     2.24    2.08    0.97          0.90             2.22    2.00    0.99          0.90
  4     2.80    2.59    0.96          0.89             2.60    2.30    0.99          0.88
  ∞     14.4    13.9    0.85          0.82             11.9    9.11    0.95          0.72

        E. Fourier                                     F. Flexible Fourier
  1     6.71    4.19    1.00          0.62             1.04    1.01    0.99          0.96
  2     13.7    5.36    1.00          0.39             1.36    1.32    0.99          0.96
  3     20.8    6.37    1.00          0.31             1.70    1.65    0.99          0.96
  4     26.7    7.26    1.00          0.27             2.08    2.02    0.98          0.95
  ∞     84.9    22.2    1.00          0.26             10.0    9.86    0.92          0.91

Notes: The table is based on 1,000 replications. 
L^p_O is the L^p error of the original estimate. 
L^p_RX is the L^p error of the estimate rearranged in age x. 
L^p_RU is the L^p error of the estimate rearranged in the quantile index u. 
Figure 1. Nonparametric estimates of the Conditional Expectation 
Function (CEF) of height given age and their increasing rearrangements. 
Nonparametric estimates are obtained using kernel regression (A), locally 
linear regression (B), cubic B-splines series (C), a fourth-degree polynomial 
(D), Fourier series (E), and flexible Fourier series (F). 

Figure 2. Nonparametric estimates of the 5%, 50%, and 95% Conditional 
Quantile Functions (CQF) of height given age and their increasing 
rearrangements. Nonparametric estimates are obtained using kernel 
regression (A), locally linear regression (B), cubic B-splines series (C), a 
fourth-degree polynomial (D), Fourier series (E), and flexible Fourier series 
(F). 

Figure 3. Kernel estimates of the Conditional Quantile Process (CQP) 
of height given age and their increasing rearrangements. Panels C and 
E plot the one-dimensional increasing rearrangement along the age and 
quantile dimensions, respectively; panel G shows the average multivariate 
rearrangement. 

Figure 4. Locally linear estimates of the Conditional Quantile Process 
(CQP) of height given age and their increasing rearrangements. Panels C 
and E plot the one-dimensional increasing rearrangement along the age and 
quantile dimensions, respectively; panel G shows the average multivariate 
rearrangement. 

Figure 5. Cubic B-splines series estimates of the Conditional Quantile 
Process (CQP) of height given age and their increasing rearrangements. 
Panels C and E plot the one-dimensional increasing rearrangement along 
the age and quantile dimensions, respectively; panel G shows the average 
multivariate rearrangement. 

Figure 6. Quartic polynomial series estimates of the Conditional Quantile 
Process (CQP) of height given age and their increasing rearrangements. 
Panels C and E plot the one-dimensional increasing rearrangement 
along the age and quantile dimensions, respectively; panel G shows 
the average multivariate rearrangement. 

Figure 7. Fourier series estimates of the Conditional Quantile Process 
(CQP) of height given age and their increasing rearrangements. Panels C 
and E plot the one-dimensional increasing rearrangement along the age and 
quantile dimensions, respectively; panel G shows the average multivariate 
rearrangement. 

Figure 8. Flexible Fourier form series estimates of the Conditional 
Quantile Process (CQP) of height given age and their increasing rearrangements. 
Panels C and E plot the one-dimensional increasing rearrangement 
along the age and quantile dimensions, respectively; panel G shows the average 
multivariate rearrangement. 



